hysop.backend.device.opencl.autotunable_kernels.transpose module

class hysop.backend.device.opencl.autotunable_kernels.transpose.OpenClAutotunableTransposeKernel(cl_env, typegen, build_opts, autotuner_config, **kwds)

Bases: OpenClAutotunableKernel

Autotunable interface for transpose kernel code generators.

autotune(is_inplace, input_buffer, output_buffer, axes, hardcode_arrays, name=None, **kwds)

Autotune this kernel with specified axes, inputs and outputs.

compute_args_mapping(extra_kwds, extra_parameters)

Return the arguments mapping, a dictionary with argument names as keys and tuples as values.

Each tuple should contain (arg_position, arg_type(s)), where arg_position is an int and arg_type(s) is a type or tuple of types against which the argument is checked.
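As an illustration of the mapping shape described above, the following minimal sketch builds such a dictionary and checks a set of keyword arguments against it. The argument names (in_base, out_base, shape) and the check_args helper are hypothetical and are not taken from hysop; the real kernel defines its own arguments and performs its own checks.

```python
# Hypothetical mapping: argument name -> (arg_position, arg_type(s)).
# These names are placeholders, not the actual transpose kernel arguments.
args_mapping = {
    "in_base": (0, bytes),
    "out_base": (1, bytes),
    "shape": (2, (tuple, list)),
}


def check_args(args_mapping, **kwds):
    """Return the arguments ordered by arg_position after type-checking them."""
    positions = sorted(pos for pos, _ in args_mapping.values())
    if positions != list(range(len(args_mapping))):
        raise ValueError("argument positions must be 0..n-1 without gaps")
    ordered = [None] * len(args_mapping)
    for name, (pos, types) in args_mapping.items():
        if name not in kwds:
            raise KeyError(f"missing argument {name!r}")
        value = kwds[name]
        # isinstance accepts either a single type or a tuple of types.
        if not isinstance(value, types):
            raise TypeError(f"argument {name!r} has type {type(value).__name__}")
        ordered[pos] = value
    return ordered
```

Calling check_args(args_mapping, in_base=b"a", out_base=b"b", shape=(2, 3)) returns the values in positional order, while passing a str for in_base raises TypeError.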

compute_global_work_size(local_work_size, work, extra_parameters, extra_kwds)

Compute aligned global_work_size from unaligned global_work_size and local_work_size. Input global_work_size may be None.
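A common way to align a global work size is to round each dimension up to the next multiple of the matching local work size. The sketch below shows that rounding only. The function name align_global_work_size is hypothetical, the case where the input global_work_size is None is not handled, and this is not necessarily how hysop computes the value for the transpose kernel.

```python
def align_global_work_size(global_work_size, local_work_size):
    """Round each global dimension up to a multiple of the local dimension."""
    if len(global_work_size) != len(local_work_size):
        raise ValueError("global and local work sizes must have the same rank")
    if any(l <= 0 for l in local_work_size):
        raise ValueError("local work sizes must be positive")
    # Integer ceiling division, then scale back to a multiple of l.
    return tuple(
        ((g + l - 1) // l) * l
        for g, l in zip(global_work_size, local_work_size)
    )


aligned = align_global_work_size((100, 33), (32, 8))  # (128, 40)
```

Work-items that fall outside the original unaligned size then need to be masked out inside the kernel.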

compute_parameters(extra_kwds)

Register extra parameters to optimize.

compute_work_candidates(work_bounds, work_load, extra_parameters, extra_kwds)

Configure work (global_size, local_size candidates) given an OpenClWorkBoundsConfiguration object and a work_load.

Return a WorkConfiguration object.

Notes

global_work_size can be ignored if it depends on local_work_size; in that case it is set in self.compute_global_work_size().

generate_kernel_src(global_work_size, local_work_size, extra_parameters, extra_kwds, tuning_mode, dry_run, force_verbose=False, force_debug=False, return_codegen=False)

Generate kernel name and source code.

hash_extra_kwds(extra_kwds)

Hash the extra_kwds dictionary for caching purposes.
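To show what hashing a keyword dictionary for caching can look like, here is a minimal sketch that produces a digest independent of key insertion order. The helper name hash_kwds is hypothetical, the sketch assumes string keys and values whose repr() is stable across runs, and it does not reproduce the hashing scheme hysop actually uses.

```python
import hashlib


def hash_kwds(kwds):
    """Return a hex digest of a dict that ignores key insertion order."""
    # Sort by key so that equal dictionaries always serialize identically.
    items = sorted((key, repr(value)) for key, value in kwds.items())
    payload = repr(items).encode("utf-8")
    return hashlib.sha1(payload).hexdigest()


# Same content in a different order gives the same cache key.
key_a = hash_kwds({"axes": (0, 2, 1), "is_inplace": False})
key_b = hash_kwds({"is_inplace": False, "axes": (0, 2, 1)})
```

Values such as device buffers usually have no stable repr(), so a real cache key would be built from their relevant properties (for example shape and dtype) instead.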